# FP8 quantization

A directory of FP8-quantized model releases. Each entry lists the publisher, license, tags, download count, and like count as reported on the hosting page.

| Model | Publisher | License | Tags | Downloads | Likes | Description |
|---|---|---|---|---|---|---|
| Bielik 4.5B V3.0 Instruct FP8 Dynamic | speakleash | Apache-2.0 | Large Language Model, Other | 74 | 1 | FP8-quantized version of Bielik-4.5B-v3.0-Instruct; AutoFP8 quantizes both weights and activations to FP8, cutting disk-space and GPU-memory requirements by roughly 50%. |
| Bielik 1.5B V3.0 Instruct FP8 Dynamic | speakleash | Apache-2.0 | Large Language Model, Other | 31 | 1 | FP8 dynamic quantization of Bielik-1.5B-v3.0-Instruct, adapted for the vLLM and SGLang inference frameworks; AutoFP8 reduces parameters from 16-bit to 8-bit, significantly lowering disk-space and GPU VRAM requirements. |
| Qwen3 30B A3B FP8 Dynamic | RedHatAI | Apache-2.0 | Large Language Model, Transformers | 187 | 2 | FP8-quantized version of Qwen3-30B-A3B; significantly reduces memory requirements and compute cost while maintaining the original model's accuracy. |
| Qwen3 8B FP8 Dynamic | RedHatAI | Apache-2.0 | Large Language Model, Transformers | 81 | 1 | FP8-quantized version of Qwen3-8B; significantly reduces GPU-memory and disk usage while maintaining the original model's performance. |
| Qwen3 32B FP8 Dynamic | RedHatAI | Apache-2.0 | Large Language Model, Transformers | 917 | 8 | FP8 dynamic quantization of Qwen3-32B; significantly reduces memory requirements and improves computational efficiency. |
| Mistral Small 3.1 24B Instruct 2503 FP8 Dynamic | RedHatAI | Apache-2.0 | Safetensors, Supports Multiple Languages | 2,650 | 5 | 24B-parameter conditional-generation model on the Mistral3 architecture, optimized with FP8 dynamic quantization; suited to multilingual text generation and visual-understanding tasks. |
| QwQ 32B FP8 | qingcheng-ai | Apache-2.0 | Large Language Model, Transformers | 144 | 6 | FP8-quantized version of QwQ-32B; maintains nearly the same accuracy as the BF16 version while enabling faster inference. |
| QwQ 32B FP8 Dynamic | nm-testing | MIT | Large Language Model, Transformers | 3,895 | 3 | FP8-quantized version of QwQ-32B; dynamic quantization cuts storage and memory requirements by 50% while retaining 99.75% of the original model's accuracy. |
| QwQ 32B FP8 Dynamic | RedHatAI | MIT | Large Language Model, Transformers | 3,107 | 8 | FP8-quantized version of QwQ-32B; dynamic quantization cuts storage and memory requirements by 50% while retaining 99.75% of the original model's accuracy. |
| Flex.1 Alpha Fp8 | gmonsoon | Apache-2.0 | Text-to-Image, English | 225 | 5 | Safetensors release of Flex.1-alpha with float8_e4m3fn weights, for text-to-image generation. |
| SD3.5 Large Fp8 | dyedd | Other | Image Generation | 88 | 2 | FP8-quantized version of Stable Diffusion 3.5 Large for text-to-image generation. |
| Llama 3.2 1B Instruct FP8 | RedHatAI | | Large Language Model, Safetensors, Supports Multiple Languages | 1,718 | 3 | FP8-quantized version of Llama-3.2-1B-Instruct for multilingual commercial and research use; performance is close to the original model. |
| Llama 3.2 3B Instruct FP8 Dynamic | RedHatAI | | Large Language Model, Safetensors, Supports Multiple Languages | 986 | 3 | FP8-quantized version of Llama-3.2-3B-Instruct for multilingual commercial and research use, particularly assistant-style chat. |
| Meta Llama 3.1 70B FP8 | RedHatAI | | Large Language Model, Transformers, Supports Multiple Languages | 191 | 2 | FP8-quantized version of Meta-Llama-3.1-70B for multilingual commercial and research use; both weights and activations are in FP8, cutting storage and memory requirements by roughly 50%. |
| Meta Llama 3.1 8B FP8 | RedHatAI | | Large Language Model, Transformers, Supports Multiple Languages | 4,154 | 7 | FP8-quantized version of Meta-Llama-3.1-8B for multilingual commercial and research use. |
| Meta Llama 3.1 70B Instruct FP8 | RedHatAI | | Large Language Model, Transformers, Supports Multiple Languages | 71.73k | 45 | FP8-quantized version of Meta-Llama-3.1-70B-Instruct for multilingual commercial and research use, especially assistant-style chat. |
| Meta Llama 3.1 8B Instruct FP8 | RedHatAI | | Large Language Model, Transformers, Supports Multiple Languages | 361.53k | 42 | FP8-quantized version of Meta-Llama-3.1-8B-Instruct for multilingual commercial and research use, specially optimized for assistant-style chat. |
© 2025 AIbase